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Abstract. We introduce a new method for testing departure from isotropy of 
points on a sphere based on an enhanced form of the two-point correlation function 
that we named 2pt+. This method uses information from the two extra variables 
that define the vector between two points on a sphere. We show that this is a 
powerful method to test departure from isotropy of a distribution of points on a 
sphere especially when the number of events is small. We apply the method to 
a few examples in astronomy and discuss the relevance for limited datasets, such 
as the case of ultra-high energy cosmic rays. 
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1. Introduction 

Determining the degree of anisotropy in the sky distribution of astronomical objects, 
background radiation fields, and transient events is often key in unveiling the hidden 
nature of an astronomical phenomenon. A number of tests for particular anisotropic 
patterns are available in the literature such as spherical harmonic decomposition, 
tests for specific angular scale fluctuations (multipole-specific tests), 2pt correlation 
function, high-order correlation functions, and so forth. Different types of tests are 
more useful depending on the character of the anisotropy. Some areas of astronomical 
study are characterized by small fluctuations on a largely isotropic background (such 
as the cosmic microwave background, e.g., [T]; the large-scale structure of galaxy 
distributions, e.g., [2 [3]; neutrino astronomy, e.g., [4], and the extragalactic GeV 
gamma-ray background, e.g., [5]), while other areas see deviations from isotropy with 
very sparsely populated datasets (such as the sky distribution of pulsars, e.g., [6]; 
gamma-ray bursts before the Compton Gamma-ray Observatory, e.g., an( i the 

current data on trans-GZK ultra-high energy cosmic rays, e.g., [HIEI])- Here we focus 
on a new method designed for the sparsely sampled type of datasets. This case is 
often challenging as with small datasets false positives are easier to encounter, and 
guarding against them by increasing the test significance when a signal is treated as 
real or at least interesting reduces the power of the tests (an extreme example: the 
single-point dataset is maximally anisotropic, yet fully consistent with isotropy!). 

One way that has been used to overcome these difficulties is to use insight 
and guidance from theoretical expectations to define appropriate tests for very 
specific types of anisotropy expected out of a dataset, for example tests that look 
for distributions of datapoints that follow the distribution of known catalogs (e.g., 
[9], [10l [Tl] [T21 [13] ) . However, one would like to define a test that would be "democratic" 
in its treatment of angular scales, "free" of any underlying assumption about the type 
of anisotropy, and as powerful as possible. Clearly not all requirements above can be 
maximized at the same time, so what we propose here is a reasonable compromise. 

In this work we present a statistical test which aims to quantify, in a manner as 
general as possible, how (in)frequently a particular realization of the positions of N 
points on a sphere arises from isotropy (i.e., from an underlying spatial distribution 
with a flat probability density function). The outcome of the proposed test for 
a particular "observed" data set will be a significance, expressing the fraction of 
datasets of N points drawn from an isotropic distribution that returns a value for the 
estimator (test statistic) which is equally or more rare than the "observed' one, in the 
limit that the experiment is repeated — > oo times. If this fraction is very low, then 
the observer can claim either that the "observed" dataset is a very rare realization of 
isotropy, or that the underlying ansatz of isotropy is not valid. 

For the purposes of this test, we quantitatively model isotropy on a sphere as 
follows: when N points are drawn from an isotropic distribution on a sphere, then the 
probability that n points out of the total N lie within the boundaries of an arbitrary 
surface (So) follows the binomial distribution: 

Pn,N=C%p n (l-p) N - n (1) 

where p—-^--^, and Sxot is the surface of the sphere (An for the sphere of radius 1). 

\ In the strict sense of a classical statistics test, the quantity we are calculating is a p-value. 

§ Here defined as the fraction of positive outcomes in the limit of infinite repetitions of the experiment, 

i.e., infinite number of draws of N points. 
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Our aim is to build a test that is comparably sensitive to fluctuations on a 
wide range of angular scales, for N as low as 20 up to a maximum value limited 
by the available computing power (currently, computations for N as high as ~ 100 
are accessible using a desktop computer). To check the sensitivity of the test we use 
mock anisotropic realizations sampled from a small number of equal flux point sources 
distributed isotropically and spherical harmonics with low multipole moment, 1. The 
test is expected to be sensitive in the case of mock maps drawn from a small number 
of point sources (which have power on small angular scales) and also for distributions 
with low multipole moments (which have power on large angular scales). These two 
cases cover the two extremes of fluctuations when classified in angular scale. 

Another commonly used statistical test for small datasets is based on the two- 
point (2pt) correlation function. For this test, the angular distances between pairs of 
points are calculated, and some measure of the deviation of their distribution from 
what is expected from isotropy is used as an estimator. The distribution of angular 
distances between 1225 pairs (N = 50) drawn from an isotropic distribution (test 
repeated for 1000 realizations) is shown in the left panel of Fig. [T] Here, a is the 
angular distance between two points in rad, while the number of pairs for a data set of 
TV events is Ar ^~ 1 - ) . Higher correlation functions can also be calculated, but it is not 
straightforward to extract the independent information contained in these higher order 
terms. Instead, here we build upon the 2pt test, by considering two new variables in 
addition to a, which are also related to pairs of points but are uncorrelated with a. 
The additional information used in this 2pt test motivates the name 2pt+. 




a cos(a) 



Figure 1. The distribution of a (left panel) and cos a (right panel) in 1000 
realizations of 50 events (1225 pairs) drawn from isotropy. The values of cos a are 
consistent with originating from a uniform distribution. 
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2. Basic concept of the 2pt+ test 

Fig. [2] schematically depicts the definitions of the variables that we use in this work. 
The left panel shows the vector between two points on a sphere, subtending an angle 
a, which we aim to describe using three independent variables. There is an intrinsic 
degeneracy in the definition of this vector, as either one of the points in the pair can 
be regarded as the vector origin. In this work, we always choose the point of origin 
so that the z— component of the vector is positive. The vector between two events 
translated to the origin of coordinates, can be described using the following 3 variables 
as in the right panel of Fig. O 

(i) cos a, which is a measure of the length of the vector; 

(ii) cos (3, which is the cosine of the vector's polar angle; and 
(hi) 7, which is the vector's azimuthal angle. 

The distribution of the values of these variables for 1000 realizations of 50 events and 
1225 pairs drawn from an underlying isotropic spatial distribution is shown in the 
right panel of Fig. [T] and in Fig. [3l These plots show that the frequency distribution 
of the three variables is consistent, to the accuracy of our testing, with being uniform 
(equal fraction of draws for all bin values). The two new variables introduced here in 
addition to the angular distance between pairs (cos /3/y) are sensitive to the orientation 
of the pairs and therefore to features in the sky that have preferential directions, such 
as a plane in the sky, a network of overdense regions, and filamentary structures. 

Fig. [4] shows two-dimensional histograms of the values of our three variables 
in pairs of two, for draws from isotropy. In this way, we visualize any potential 
correlation between each pair of our three variables, when averaged over many 
realizations, which is clearly very small. The lack of correlation between cos a 
and our two additional variables, cos/3 and 7, implies that the two new variables 
encode independent information that we add to the power of the standard two-point 
correlation function by incorporating them into our estimator. In addition, the lack of 
correlation between the two variables indicates that each brings to the test independent 
pieces of information, and the power of the test is maximized by incorporating both. 

However, it should be noted that there is a major difference between cos a and 
(cos/3,7): the set of cos a values for a data set is independent of the reference system 
in which it is calculated, while the values of (cos /3,7) are not. Any reference system 
on the sphere is related to each other through a rotation and any statistical test 
should be invariant under rotations. If we consider a particular data set in the plane 
(cos 0,j) (right panel in Fig. Q|, a rotation is a one-to-one mapping, and neighbouring 
points remain neighbouring (and their angular distances remain unchanged) after 
the transformation. Therefore, to minimize any possible dependence on rotations we 
adopt the following strategy: First, a binned likelihood test is applied to assess the 
flatness of the cos a distribution, and a significance (S a ) is obtained from it. Then, a 
second binned likelihood test is applied to assess the flatness of the distribution in the 
(cos /3,7) plane, obtaining a significance Sp 1 . The two significances are combined using 
Fisher's method. Any remaining dependence on rotations will be quantified below. 

3. Implementation of the 2pt+ test 

To calculate S a and Sp 7 , binned likelihood estimators are used. A data set with N 
events is a collection of N ^ N ^ 1 '> values of cos a and (cos /3,7) . Since the distributions 
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Figure 2. Schematic depiction of the variables used in the 2pt+ test. Left panel: 
angular distance a between two events. Here, the drawn sphere corresponds to 
the observed celestial sphere. The black vectors correspond to vectors extending 
from the origin (observer) to each of the two events. The blue vectors represent 
the two choices for the definition of the vector between events. In this work, we 
always use the vector with a positive 2— component (in the case depicted here, 
the vector pointing from right to left), a is the angle subtended by the two event 
vectors. Right panel: angles /3 and 7. Here, the vectors between events have been 
transported to the origin, and each of the two vectors corresponds to a different 
pair between events. The length of each vector depends on o, so the radii of the 
two spheres drawn here quantify the angular distance between each pair of events. 
The angles /3 and 7 for one of the pairs (the one represented by the blue vector) 
are shown in the figure. 
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Figure 3. The distribution of cos (left panel) and 7 (right panel) in 1000 
realizations of 50 events (1225 pairs) drawn from isotropy. The values of both 
quantities are consistent with originating from uniform distributions. 
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Figure 4. Histograms of coso-cos/3, cosct-7 and cos /3-7 values for draws from 
isotropy. No apparent pattern or correlation between any of the variable pairs is 
seen. 



are flat, if we divide the phase space into equal parts we expect on average the same 
number of events in each one. In the case of cos a we divide the phase space (-1,1) 
into equal parts in such a way that the expected number of events for each bin is fi 
(thus, the number of bins Nt,i ns , a is the integer closest to N )■ A similar approach 
is used for the (cos /3,7) plane, but since this is a two dimensional distribution, the 

number of bins in each axis is the integer closest to Nuns,^ — \J ' N ^2/7 
The two pseudo-likelihood estimators are defined as follows: 

N HnS:a 

L a = n p «'° bs ^) 

i=0 

Nbina,f3-f 

Lfh = n p ( n T bs ^) (2) 

where P(n° ! '' obs , /i) (P{n^' obs , /i)) is the Poisson distribution with mean /i and n"' obs 

( n jk'° hS ) observed number of pairs in the i th cost* bin (j th k th (cos/3,7) bin). We 
have verified through Monte Carlo simulations that the distribution in each bin is 
sufficiently similar to Poisson. 

Fig. [5] shows the distribution of the two estimators for an isotropic distribution 
with iV=50 and /i=5. It should be noted that the number of factors entering the 
calculation of L a and (see Eq. [5J are the same. Therefore, if each one is 
completely independent from each other, the distribution of the two estimators should 
be the same. This is not the case, reflecting correlations introduced when considering 
event pairs as the observable instead of the individual points on the sphere. The small 
correlations can be easily corrected for as we discuss below (see Fig. |6]). 

To calculate the significance for a given realization, we integrate the normalized 
distributions in Fig. [3 obtaining S a and Spy. If these significances quantify truly 
independent experiments, the combined significance can be calculated analytically 
using Fisher's method. The method suggests that if 2 independent tests, each 
yielding a significance (p- value), S a and S' j g 7 , are combined, then the variable 
— 2(lnS' Q + InSfjy) is distributed as chi-square with 4 degrees of freedom, and the 
combined significance can be obtained by integrating under the tail of this distribution 
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Figure 5. The distribution of the estimators L a (solid line) and L^ 7 (dotted 
line), for event draws from an isotropic distribution. If the two estimators were 
truly independent, the distributions would be identical. Small correlations are 
introduced by considering pairs instead of events on the sphere, causing the 
deviation of their Fisher-combined significance from a random variable uniformly 
distributed between and 1, as seen in Fig. [6] 



above its "observed" value defined by the two individually measured significances. The 
probability distribution function (PDF) of the chi-square with 4 degrees of freedom, 
if the random variable is Y, is /(Y, 4) = Yexp[— Y/2]/4, and the significance for an 
observed value Yq (i-e., the PDF integral for Y values above Yq) is 

Sy = jT° f(Y, A)dY = (l + y) exp[- V2] ■ (3) 
Substituting Yq = — 2(ln5 Q + hxSp^) we get 

5'combined = S a S/3j (1 — In SaSpy) . (4) 

This is simply the probability that the product of two random variables uniformly 
distributed between (0,1) is smaller or equal than the observed product. 

The left panel in Fig. [6]shows the distribution of logioS'combined (solid histogram) 
together with the distribution of logiow (dotted histogram), where u is a random 
number between (0,1). Deviations for significances smaller than 10 are apparent 
and caused by a small correlation between L a and Lp 1 . The combined significance 
^combined is then corrected using Monte Carlo calculations. The distribution of the 
corrected significance, S™" bine d , is shown in the right panel of Fig. and is indeed 
consistent with a uniform distribution. 
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Figure 6. Left panel: combined significances obtained with Eq. [4] (solid line) 
compared with the expected distribution if the two estimators are completely 
independent (dotted line). Right panel: the distribution of S£°Jjl[ b j d for 50 events 
and fi = 5. 



3.1. The selection of bin size (\x) 

The least straight-forward part in a binned likelihood is setting the value of /i to be 
used, i.e., the bin size. If we have N random numbers distributed between and 1, the 
maximum number of independent questions to be asked is N. Therefore, the number 
of bins should be smaller than N, i.e., n (the expectation value for points per bin) 
should be equal to or larger than 1. In the present case, since ./V is the number of 
pairs and enhanced correlations are present, \i should be clearly larger than 1. 

A binned likelihood test is only sensitive to fluctuations at scales larger than 
the bin size. That is the reason why the bin size should be kept as small as 
possible when there is no a priori reason setting a lower-limit to its size. However, 
the sensitivity to fluctuations at increasingly larger scales decreases because the 
information about coherent fluctuations of bins is lost in the binned likelihood method. 
A possible approach is to modify the binned likelihood to include the information 
about correlations between bins. However, in this work we adopt a more conservative 
approach: we keep the bin size as low as possible and consider the resulting sensitivity 
as a lower bound to what can in principle be achieved. 

To select the minimum reasonable \i, we generate mock anisotropic realizations 
of 50 events sampled from: a) n ps point sources isotropically distributed, and b) 
multipoles with low values of 1. Each case is evaluated 5000 times and each time S a , 
S/s-y and S'comfeined are calculated. We use the fraction of realizations with a significance 
smaller than 10~ 4 (hereafter F4) as a measure of the sensitivity to each case. 

Fig. [7] shows F 4 for two cases with N — 50: the solid line is for 20 equal flux 
point sources distributed isotropically with 50% isotropic background (i.e., 25 events 
from point sources and 25 events from the background); while the dashed line shows 
events sampled from a spherical harmonic, Y\ >m , with (l,m)=(2,0) with 20% (10 events) 
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Figure 7. Impact of a Gaussian smearing on mock anisotropy maps. Only L a is 
used to calculate the significances. Results shown are for TV = 50 and fi = 5. The 
solid curve is for 25 events from 20 equal flux point sources isotropically distributed 
over a 50% isotropic background (i.e., 25 isotropic events). The dashed line is for 
a dipolc with 20% (i.e., 10 events) of the events from an isotropic background. 



from an isotropic background. Each event is smeared with a Gaussian distribution of 
variable width indicated in the x axis. This smearing is representative of the angular 
resolution of the experiment (or physical smearing, e.g., due to lensing). The value of 
i<4 is calculated using only S a . It is clear from the multipole case that if the anisotropy 
manifests itself at angular scales larger than the smearing angle, the sensitivity of the 
test does depend on the smearing. The dependence is strong in the case of point 
sources. Realizations of isotropy would correspond to a flat line with i 7 4=10 -4 , i.e., 
outside of the range shown. The value of \i used is 5, showing that for this particular 
value we are sensitive to fluctuations at very different angular scales. 

The left panel in Fig.[H]shows F4 as a function of the smearing angle for 3 choices of 
fi (3, 5, and 10), where only S a is used to calculate -F4. The mock samples correspond 
to 20 point sources and 50% isotropic background. The trend is clear: for larger 
smearing angles, larger values of /x lead to increasing sensitivities. 

The right panel in Fig. [8] shows F4 calculated with S a as a function of \i for 
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Figure 8. The selection of {i (see text) to calculate L a . Results shown are for 
N = 50. Left panel shows F4 from S a as a function of smearing angle of 20 point 
sources for fi= 3 (solid), 5 (dashed), and 10 (dotted). Right panel shows F4 from 
S a as a function of fi for spherical harmonics with (l,m)=(l,0) solid, (2,0) dashed, 
and (4,0) dotted lines. 



realizations sampled from multipoles with different values of (l,m). (In the figure 
(l,m) is (1,0) for the solid line, (2,0) for the dashed line and (4,0) for the dotted line 
respectively). The sensitivity increases for values of [i between 3 and 10 and is roughly 
constant after that. 

Fig. [5] shows -F4 as a function /x for realizations sampled from the same multipoles 
as in the right panel of Fig. [8] This time Sp 1 is used to calculate F4. The dependence 
on jj, is also mild above 5. 

After this exercise, it is clear that fi—5 is a rather conservative value to be used 
either for cos a or (cos f3, 7), and it is the one used in this work from now on. We are 
currently developing an unbinned likelihood estimator that bypasses the problem of 
selecting fi. The results of this analysis will be reported in a future publication. 

3. 2. Impact of a change of coordinates 

As discussed above, the set of (cos /3, 7) values depends on the coordinate system used 
to evaluate them. To check the sensitivity of the test to the coordinate system, we 
calculate Sp 7 using a reference coordinate system Sp and another, randomly chosen 
coordinate system Sp. Fig. [10] shows the result for realizations sampled from a 
multipole with (l,m)=(2,0). The two significances are not numerically equal, however 
they show excellent correlation with a spread in ~LogioSp-y of ~ 0.22. This spread, 
when propagated to the final significance S^°^ bined is further reduced (to ~ 0.17). So 
we can conclude that the dependence on the chosen coordinate system in not very 
significant. 
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Figure 9. F4 from S function of fi for spherical harmonics with (l,m)=(l,0) 

solid, (2,0) dashed, and (4,0) dotted lines. Results shown are for N = 50. 



4. The sensitivity of the test 

Fig. QT] shows the sensitivity of the test as a function of the number of events in 
the sphere for 4 anisotropic cases. Three spherical harmonics with 20% isotropic 
background are shown ((l,m) = (1,0) solid, (2,0) dashed, (4,0) dotted lines) and 
the case of 20 point sources with 50% background is shown in the dot-dashed line. 
The evolution of F4 is different for different cases but in all of them there is an 
increase of sensitivity when the number of events is increased. The test seems to show 
good sensitivity for number of events between 25 and 100 with anisotropies at rather 
different angular scales. 

Fig. [12] demonstrates the sensitivity improvement of the 2pt+ test with respect 
to the standard 2pt test, which only considers information encoded in a. In this 
figure we logarithmically plot, for a low-multipole underlying distribution (l,m = 2,0) 
over a 20% isotropic background, the significance using information in a alone, S a , 
as a function of ^combined f° r t ne same realization. The experiment is performed for 
N = 50 and with a bin size fi = 5. We see that S a is in almost all cases higher (worse) 
than (^combined- The 2pt+ test is systematically more sensitive than the 2pt test for 
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Figure 10. Impact of a change of coordinates on the significances calculated using 
L/s-y. Left panel: correlation between Sp 1 calculated in each coordinate system. 
Right panel: distribution of the dispersion of the two significances around the 
diagonal. 



this type of anisotropy. 

Fig. 1131 similarly demonstrates the sensitivity improvement of the 2pt+ over the 
standard 2pt test for a type of anisotropy motivated by recent results from the Pierre 
Auger Observatory on the sky distribution of events with energies above ~ 6 10 19 
eV [3 [10]. The distribution of the 27 highest energy events were found to depart 
from isotropy through a test which shows a correlation of ultra-high energy cosmic 
ray events with nearby active galactic nuclei (AGN) selected from the Veron-Cetty & 
Veron catalog of quasars and active galactic nuclei (12th Ed.) [U] (VCV) with redshift 
z < 0.018 and a Gaussian 3.2 degree smearing. This recent result is one of the most 
prominent cases of sparsely-sampled datasets that need to be tested for compatibility 
with an isotropic distribution on the sky, and a natural application for the 2pt+ test. 

On the left panel of Fig. [13j the 2pt+ and the classic 2pt (here implemented 
using S a alone) are shown for a mock catalog with varying number of events (from 20 
to 60) drawn from the VCV catalog of AGN with the same parameters chosen by the 
Auger results. The right panel shows the same functions for the same catalog plus an 
isotropic background with 50% of the events. The black and blue solid lines correspond 
to the median significance of the 2pt and 2pt+ tests as a function of TV respectively, 
while the bands around these lines correspond to the behavior of ±34% of sets around 
the median. In both cases the test performs better than the classic 2pt test. The 
relative improvement increases with increasing number of events. In addition, in 
the case where the anisotropic signal is added to a 50% isotropic background (lower 
panel), the median significance of the 2pt+ test reaches sub-percent levels above 100 
events, while hardly any overall improvement in significance is seen with the 2pt test: 
for the case of a mild signal over a substantial isotropic background, the 2pt+ test can 
pick up the deviation from isotropy with samples so sparse that the 2pt test would 
show no sensitivity whatsoever. 
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Figure 11. Sensitivity of the test as a function of the number of events for 4 
cases of anisotropics . 



Finally, in Fig. [14] we plot, for N = 60 and for the signal corresponding to the 
lower panel of Fig. [13J a histogram of the significance values returned for different 
realizations of the specific experiment by the 2pt (black line) and the 2pt+ (red line) 
tests: not only is the peak (most frequent) significance of the 2pt+ test translated 
to smaller values compared to the 2pt test, but also the very-low-value tails are 
systematically more populated. 

5. Testing a different hypothesis 

The test can be also used to reject a hypothesis other than isotropy. Quantitatively, 
any hypothesis can be expressed as a probability density function ^J^oJedt^ wnere 
cos# and 4> are the polar angles on the sphere. For a sphere of unit radius and the 
isotropic ansatz the probability density function is: dP J^° s S g^ — j^- If the underlying 
probability density function is not flat, correlations between cos a, cos/3 and 7 would 
appear, complicating the implementation of the test. Our proposed approach to deal 
with non-flat distributions is to apply a change of coordinates to a reference system 
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Figure 12. Improvement in sensitivity by using the 2pt+ test over the classic 2pt 
test (here implemented by using ja alone), for a low-multipole (1=2) anisotropy 
with a 20% isotropic background. Each realization corresponds to N = 50 and 
/i = 5. 



for which 



dP(cos8,<j>) 



For any hypothesis, a point on the sphere (cos6q,4>) is 



dcosOdtf) 4tt ' 

mapped into another position on the sphere with the following polar coordinates: 



cos #'(cos 9o,4>o) 



//(cos O , 



cos9q 



dp 



d cos 6d<j) 



(cos#, <f>o)dcos9 



<t>0 



dp 



dcos 



(cos 9q, (f))d(j) 



(5) 

This transformation has the following nice property: if we have N points on the 
sphere, we can draw N parallels and meridians in such a way that each one passes 
through a point. This determines a grid on the sphere. The transformation proposed 
deforms the grid, changing the area of each cell, but introduces no caustics. This 
transformation effectively separates (brings closer together) the points in overdcnse 
(underdense) regions of the sphere. 

This approach is simple and general, and overcomes the difficulties of dealing with 



The 2pt+: an enhanced 2 point correlation function 



15 




140 160 
#events 



Figure 13. Evolution of the median of the significances and 1 sigma band as a 
fucntion of the number of events for the 2pt+ and differential 2pt. The events are 
drawn from the VCV catalogue of sources with z < 0.018 and 3.2 deg smearing, 
and taking into account the Auger South exposure. Upper panel has no isotropic 
background while a 50% isotropic background is included in the lower panel. 
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Figure 14. Distribution of the significances of the 2pt+ and differential 2pt when 
60 events are drawn from a VCV catalogue of sources with z < 0.018 and 3.2 deg 
smearing angle, summed to 50% isotropic component, and taking into account 
the Auger South exposure. 



each case in particular. We recommend the use of this approach even when the sphere 
is not complete. 

6. Discussion 

We have presented a novel statistical test optimized for testing sparsely sampled 
datasets (with as few as 20 datapoints) for compatibility with an isotropic distribution 
on the sky. The test is an enhanced version of the classic 2pt test, incorporating all 
the information utilized in the 2pt test (the angular distances between events, or, 
equivalently, the length of the vector connecting event pairs), as well as additional, 
independent information about the orientation of each vector connecting event pairs. 
We call this new test the 2pt+ test. 

We propose here a specific method for the implementation of this test, based on 
distinct pseudo-likelihood estimators on a binned parameter space. The individual 
significances from each estimator are then combined using Fisher's method. Because 
of small correlations introduced by working in pair space rather than event space, 
the resulting quantity is not uniformly distributed as required for a true significance, 
so the final result we quote as significance is first corrected (its distribution is first 
rendered uniform) by Monte-Carlo simulations. We additionally discuss a proposed 
methodology for utilizing the 2pt+ test to assess the compatibility of an event set 
with a distribution different from isotropy. 

We have tested the behavior of the test as a function of parameter space binning, 
number of events, and type of anisotropy. Based on these tests, we recommend a 
binning choice of /i = 5. The sensitivity of the test increases with increasing number 
of events, and, depending on the type of anisotropy, the method can saturate to 
significances better than 10 -4 for as few as 60 events. 

We have also compared the sensitivity of the 2pt+ test with that of the 2pt test for 
different types of anisotropy and have found that the additional information encoded 
in the 2pt+ test results in yielding consistently better significances over the 2pt test. 
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